arXiv:1503.03673v1 [math.ST] 12 Mar 2015


Functional Inverse Regression in an Enlarged 
Dimension Reduction Space 


Ting-Li Chen 1 , Su-Yun Huang 1 , Yanyuan Ma 2 and I-Ping Tu 1 

1 Institute of Statistical Science 
Academia Sinica 
Taipei 11529, Taiwan 

email: tlchen, syhuang, iping@stat.sinica.edu.tw 

2 Department of Statistics 
The University of South Carolina 
Columbia, SC 29208 
email: yanyuanma@stat.sc.edu 

Abstract: We consider an enlarged dimension reduction space in functional inverse
regression. Our operator and functional analysis based approach facilitates a compact
and rigorous formulation of the functional inverse regression problem. It also
enables us to expand the possible space where the dimension reduction functions
belong. Our formulation provides a unified framework so that the classical notions,
such as covariance standardization, Mahalanobis distance, SIR and linear discriminant
analysis, can be naturally and smoothly carried out in our enlarged space.
This enlarged dimension reduction space also links to the linear discriminant space
of Gaussian measures on a separable Hilbert space.


1 Introduction


Traditionally, sufficient dimension reduction problems refer to the estimation of the space
spanned by the columns of $\beta$, where $\beta$ satisfies $Y \perp X \mid \beta^T X$. Here, $X$ is a $p$-dimensional
covariate vector, which we assume to satisfy $E(X) = 0$ for simplicity, $\beta$ is a $p \times d$ matrix and $Y$
is a univariate response variable. An equivalent form is $Y = f(\beta^T X, \epsilon)$, where $\epsilon$ is a mean zero
random variable independent of $X$. By far the most well known estimation procedure for this
problem is sliced inverse regression (SIR, Li (1991)), where finding the leading $d$ eigenvectors
of the generalized eigenvalue problem $\Gamma_e v = \lambda \Gamma v$ is all one needs to do to obtain the column
space of $\beta$. Here $\Gamma = \mathrm{cov}(X)$ and $\Gamma_e = \mathrm{cov}\{E(X \mid Y)\}$. SIR is constructed under a linearity
condition which requires $E(X \mid \beta^T X) = \Gamma\beta(\beta^T \Gamma \beta)^{-1}\beta^T X$, and it has been further developed into
a whole class of inverse regression based methods for dimension reduction. To understand
the inverse regression based methods from a different angle, we can normalize the covariates
by viewing $Z = \Gamma^{-1/2} X$ as the new covariates and $\eta = \Gamma^{1/2}\beta$ as the new dimension reduction
matrix. Considering the dimension reduction problem in terms of $(Z, Y, \eta)$ instead of $(X, Y, \beta)$
enables much simplification and permits clearer exhibition of the critical operations (Li (1991);
Ma and Zhu (2012)).


Dimension reduction problems have been extended from the traditional regression domain
to the functional data analysis domain. See Jiang et al. (2014) and references therein. The
model considered in the functional dimension reduction framework is

$$Y = f(\langle \beta_1, X\rangle_{L_2}, \ldots, \langle \beta_d, X\rangle_{L_2}, \epsilon),$$

where $Y$ is still a univariate response variable, $X$ is now a covariate function, $\beta_1, \ldots, \beta_d$ are
parameter functions in $L_2(I)$, and $\langle \cdot, \cdot \rangle_{L_2}$ denotes the inner product of two functions in the
$L_2(I)$ space. Since the matrix vector product $\beta^T X$ in the traditional case can be expressed as
$(\beta_1^T X, \ldots, \beta_d^T X)^T$, which can also be viewed as a vector of inner products between the vectors
$\beta_k$ and the covariate $X$, one might think that the extension to the functional data framework
is straightforward. However, there are many subtleties when finite dimensional quantities are
extended to infinite dimensional ones, such as $\beta_1, \ldots, \beta_d$ and $X$ to $\beta_1(\cdot), \ldots, \beta_d(\cdot)$ and $X(\cdot)$.
Some properties we take for granted in finite dimension may not hold automatically, e.g., some
vector norm or the inner product between vectors may not be finite. If we want to perform
a standardization similar to that in the finite dimensional case by forming $Z(\cdot) = \Gamma^{-1/2} X(\cdot)$ as
the new covariate function, and $\eta_j(\cdot) = \Gamma^{1/2}\beta_j(\cdot)$ as the new dimension reduction functions for the
functional counterpart of the variance-covariance matrix (operator) $\Gamma$, not only do we need
to consider the extensions from vectors to functions and matrices to operators, but we also need to
define proper normed spaces and their corresponding requirements on these functions. A
careful and rigorous consideration of these issues will enable less restrictive models and more
flexible estimation. In fact, one of the main messages of this article is that the
requirement in Jiang et al. (2014) that the parameter functions $\beta_j(\cdot)$ be in $L_2(I)$ is too strong
and can be relaxed to include more interesting examples.

During the process of our investigation, we also realized that it is crucial to formulate
the functional dimension reduction problem properly in order to facilitate the subsequent
application of the existing mathematical tools from functional analysis involving Reproducing
Kernel Hilbert Space (RKHS) and operator theories. To better prepare for such a task, we
summarize some preliminary results in Section 2 and provide an outline of either a proof or an
understanding for each result. In Section 3 we give a few motivating examples, wherein the
dimension reduction functions fall outside the space required in Jiang et al. (2014) and hence
cannot be handled under their model. We then present an extension of the functional dimension
reduction model in Section 4 together with some main results. Our extension works on an
enlarged space, so that the classical notion of SIR in standardized scale can be carried out.


2 Preliminary 


2.1 Covariance operators and integral operators 


Without loss of generality, we restrict our attention to functions defined on $I = [0,1]$. Let the
Hilbert space $L_2(I)$ be the space of square-integrable functions defined on $I$, equipped with the inner product
given by

$$\langle u, v\rangle_{L_2} = \int_0^1 u(t)v(t)\,dt, \qquad u, v \in L_2(I).$$

Let $\Gamma(s,t)$ be a continuous bivariate function on $I \times I$. Then $\Gamma(s,t)$ induces a linear integral
operator, still written as $\Gamma$, where its operation on a function $u(\cdot) \in L_2(I)$ is defined as

$$(\Gamma u)(s) = \int_0^1 \Gamma(s,t)u(t)\,dt = \langle \Gamma(s,\cdot), u(\cdot)\rangle_{L_2} \quad \text{for } u \in L_2(I). \tag{1}$$

When $\langle u, \Gamma v\rangle_{L_2} = \langle \Gamma u, v\rangle_{L_2}$ for all $u, v \in L_2(I)$, $\Gamma$ is said to be a symmetric linear integral
operator. Note that

$$\langle u, \Gamma v\rangle_{L_2} = \int_0^1\!\!\int_0^1 u(s)\Gamma(s,t)v(t)\,dt\,ds,$$
$$\langle \Gamma u, v\rangle_{L_2} = \int_0^1\!\!\int_0^1 u(s)\Gamma(t,s)v(t)\,dt\,ds.$$

Hence, as long as $\Gamma(s,t)$ is symmetric as a function of $(s,t)$ defined on $I \times I$, its induced
operator $\Gamma$ is also a symmetric operator. When $\langle u, \Gamma u\rangle_{L_2} \ge 0$ for all $u \in L_2(I)$, $\Gamma$ is said
to be positive semi-definite (or non-negative definite). When the equality holds if and only if
$u = 0$ almost everywhere, $\Gamma$ is said to be positive definite (or strictly positive definite). A positive (semi-)
definite linear integral operator is also known as a covariance operator. Let $B$ denote the unit
ball in $L_2(I)$, i.e., $B = \{f \in L_2(I) : \|f\|_{L_2} \le 1\}$. An operator $\Gamma$ defined on $L_2(I)$ that maps
to $L_2(I)$ is said to be compact if the image of the unit ball, $\Gamma(B)$, is a compact set in $L_2(I)$.

Let $X(t)$, $t \in I$, be a random process with finite second moments and $Y$ be a univariate
random variable. We now consider three specific bivariate functions and their induced
operators,

$$\Gamma(s,t) = \mathrm{cov}\{X(s), X(t)\}, \qquad \Gamma_w(s,t) = E[\mathrm{cov}\{X(s), X(t) \mid Y\}],$$

and

$$\Gamma_e(s,t) = \mathrm{cov}[E\{X(s) \mid Y\}, E\{X(t) \mid Y\}].$$

It is easy to verify that $\Gamma(s,t)$, $\Gamma_w(s,t)$ and $\Gamma_e(s,t)$ are all symmetric bivariate functions and
$\Gamma(s,t) = \Gamma_w(s,t) + \Gamma_e(s,t)$. We further assume $\Gamma(s,t)$, $\Gamma_w(s,t)$ and $\Gamma_e(s,t)$ to be continuous. The
continuity of the functions $\Gamma(s,t)$, $\Gamma_w(s,t)$ and $\Gamma_e(s,t)$ on $I \times I$ implies that they are square integrable,
and hence the continuity guarantees that $\Gamma$, $\Gamma_w$ and $\Gamma_e$ are compact operators
on $L_2(I)$ (Lax (2002), Chapter 22, Theorem 4). The definitions of $\Gamma(s,t)$, $\Gamma_w(s,t)$ and $\Gamma_e(s,t)$
also ensure that they are positive semi-definite. Mercer's Theorem (Lax (2002), Chapter 30,
Theorem 11) then implies that they have discrete spectra. Taking $\Gamma(s,t)$ for instance, it can
be expanded in a uniformly convergent series of eigenvalues and eigenfunctions

$$\Gamma(s,t) = \sum_{i=1}^{q} \xi_i \phi_i(s)\phi_i(t), \qquad q \le \infty, \tag{2}$$

which we sometimes write in short as

$$\Gamma = \sum_{i=1}^{q} \xi_i\, \phi_i \otimes \phi_i.$$

Here $\xi_1 \ge \xi_2 \ge \cdots \ge \xi_q > 0$ are decreasing positive values. If $\Gamma(s,t)$ is strictly positive
definite, then $q = \infty$ and $\{\phi_i(\cdot)\}_{i=1}^{\infty}$ form a complete orthonormal basis for $L_2(I)$. The above
result critically relies on the strict positive definiteness of $\Gamma$. Without the assumption of $\Gamma$
being strictly positive definite, we can still decompose $\Gamma(s,t)$ as in (2), and the corresponding
$\{\phi_i(\cdot)\}_{i=1}^{q}$, $q \le \infty$, always form a complete orthonormal basis for $R(\Gamma)$, the range of $\Gamma$.
However, $R(\Gamma) \subset L_2(I)$ when $\Gamma$ is not strictly positive definite. We further outline the
following results, which are relevant to the functional inverse regression study.


Proposition 1. A continuous, symmetric, positive (semi-) definite integral operator $\Gamma(s,t) =
\sum_{i=1}^{q} \xi_i \phi_i(s)\phi_i(t)$ is a trace-class operator, i.e.,

$$\sum_{i=1}^{q} \xi_i < \infty.$$

Proof. Because $\Gamma(s,t)$ is a continuous function on $I \times I$, for $s = t$, $f(t) = \Gamma(t,t)$ is a continuous
function of $t$ in $I$, and thus is integrable. Hence, $\sum_{i=1}^{q} \xi_i = \int f(t)\,dt < \infty$. □

Proposition 2. For any positive (semi-) definite operator $\Gamma(s,t) = \sum_{i=1}^{q} \xi_i \phi_i(s)\phi_i(t)$, there
exists a mean zero random process $X(s)$ satisfying $\int E\{X^2(s)\}\,ds < \infty$ such that $\Gamma(s,t) =
\mathrm{cov}\{X(s), X(t)\}$ and

$$X(s) = \sum_{i=1}^{q} A_i \phi_i(s),$$

where the $A_i$'s are independent random variables with mean zero and variances $\xi_i$.

Proof. For $i = 1, 2, \ldots, q$, let $A_i = \sqrt{\xi_i}\, Z_i$, where the $Z_i$'s are independent standard normal random
variables. Obviously the resulting $X(s)$ is a mean zero process that satisfies $\mathrm{cov}\{X(s), X(t)\} =
\Gamma(s,t)$. In addition, $\int E\{X^2(s)\}\,ds = \sum_{i=1}^{q} \xi_i < \infty$. □

Note that, in our construction of the Gaussian process in the above proof, the sample path
$X(\cdot \mid \omega)$ may not be in $L_2(I)$ for a given realization $\omega$. However, $\int E\{X^2(s)\}\,ds < \infty$ ensures
that the probability of this kind of $\omega$ is 0. That is, $X(\cdot \mid \omega) \in L_2(I)$ almost surely. In the
following, we may simply use $X \in L_2(I)$ to denote that $X(\cdot \mid \omega) \in L_2(I)$ almost surely.
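The construction in Proposition 2 and the trace identity in Proposition 1 can be checked numerically. The following is only a minimal sketch under stated assumptions: the spectrum $\xi_i = i^{-2}$, the sine basis $\phi_i(t) = \sqrt{2}\sin(i\pi t)$, the truncation level and the sample size are all hypothetical choices made for illustration, not taken from the text.

```python
import math
import random

def phi(i, t):
    """Assumed orthonormal basis on [0, 1]: sqrt(2) * sin(i * pi * t)."""
    return math.sqrt(2.0) * math.sin(i * math.pi * t)

def kernel_diag(t, xi):
    """Gamma(t, t) = sum_i xi_i * phi_i(t)^2 for the truncated kernel."""
    return sum(x * phi(i, t) ** 2 for i, x in enumerate(xi, start=1))

def sample_path(xi, grid, rng):
    """One draw of X(s) = sum_i sqrt(xi_i) * Z_i * phi_i(s), evaluated on a grid."""
    scores = [math.sqrt(x) * rng.gauss(0.0, 1.0) for x in xi]
    return [sum(a * phi(i, s) for i, a in enumerate(scores, start=1)) for s in grid]

q, n = 20, 100                                   # hypothetical truncation and grid size
xi = [1.0 / i ** 2 for i in range(1, q + 1)]     # hypothetical eigenvalues
grid = [(k + 0.5) / n for k in range(n)]         # midpoint-rule nodes on [0, 1]

trace_from_spectrum = sum(xi)                                  # sum_i xi_i
trace_from_kernel = sum(kernel_diag(t, xi) for t in grid) / n  # int Gamma(t, t) dt

rng = random.Random(0)
paths = [sample_path(xi, grid, rng) for _ in range(300)]
# Monte Carlo estimate of int E{X^2(s)} ds
mc_energy = sum(sum(v * v for v in p) / n for p in paths) / len(paths)
```

Under these assumptions, `trace_from_spectrum` and `trace_from_kernel` agree up to rounding, and `mc_energy` fluctuates around the same value, in line with $\int E\{X^2(s)\}\,ds = \sum_i \xi_i$.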

2.2 RKHS relevant for functional inverse regression 

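Let $\mathcal{H}_\Gamma$ be the RKHS generated by $\Gamma(s,t)$. Specifically,

$$\mathcal{H}_\Gamma = \mathrm{closure}\Big\{ \sum_{i=1}^{q} \Gamma(s, t_i)a_i : q \in \mathbb{N},\ a_i \in \mathbb{R},\ t_i \in [0,1] \Big\},$$

where the closure is taken with respect to the norm induced by the inner product determined by the
reproducing property,

$$\langle \Gamma(\cdot, t_i), \Gamma(\cdot, t_j)\rangle_{\mathcal{H}_\Gamma} = \Gamma(t_i, t_j).$$

Note that $\mathcal{H}_\Gamma$ is a proper subset of $L_2(I)$. For $f \in \mathcal{H}_\Gamma \subset L_2(I)$, $f$ has the expansion

$$f(t) = \sum_i f_i \phi_i(t), \quad \text{where } f_i = \langle f, \phi_i\rangle_{L_2}.$$

In addition to its $L_2$-norm defined as $\|f\|_{L_2}^2 = \sum_i f_i^2$, the $\mathcal{H}_\Gamma$-norm is given by

$$\|f\|_{\mathcal{H}_\Gamma}^2 = \sum_i \frac{f_i^2}{\xi_i}.$$

For $u, v \in \mathcal{H}_\Gamma$, the $\mathcal{H}_\Gamma$-inner product is given by

$$\langle u, v\rangle_{\mathcal{H}_\Gamma} = \sum_i \frac{u_i v_i}{\xi_i}, \tag{3}$$

where $u(t) = \sum_i u_i \phi_i(t)$ and $v(t) = \sum_i v_i \phi_i(t)$.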
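To see why $\mathcal{H}_\Gamma$ is a proper subset of $L_2(I)$, it helps to compare the two norms on basis coefficients. The sketch below is illustrative only: the eigenvalues $\xi_i = i^{-2}$ and the coefficients $f_i = 1/i$ are assumptions, chosen so that the $L_2$-norm converges while the truncated $\mathcal{H}_\Gamma$-norm grows without bound.

```python
def l2_norm_sq(coef):
    """Squared L2-norm from basis coefficients: sum_i f_i^2."""
    return sum(c * c for c in coef)

def h_norm_sq(coef, xi):
    """Squared H_Gamma-norm from basis coefficients: sum_i f_i^2 / xi_i."""
    return sum(c * c / x for c, x in zip(coef, xi))

def truncated_norms(n):
    """Both squared norms for the first n terms of the assumed example."""
    xi = [1.0 / i ** 2 for i in range(1, n + 1)]   # assumed eigenvalues xi_i
    f = [1.0 / i for i in range(1, n + 1)]          # assumed coefficients f_i
    return l2_norm_sq(f), h_norm_sq(f, xi)

for n in (10, 100, 1000):
    l2, h = truncated_norms(n)
    print(n, round(l2, 4), round(h, 4))
```

Here each term of the $\mathcal{H}_\Gamma$-norm equals one, so the truncated norm equals the truncation level, whereas the $L_2$-norm stays below $\pi^2/6$. Such an $f$ lies in $L_2(I)$ but not in $\mathcal{H}_\Gamma$.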


3 Motivating Examples 


Throughout our development of a rigorous framework for functional inverse regression, we set
up a space,

$$R(\Gamma^{-1/2}) = \Big\{ f : f = \sum_{i=1}^{\infty} f_i \phi_i,\ f_i \in \mathbb{R} \text{ such that } \sum_{i=1}^{\infty} \xi_i f_i^2 < \infty \Big\} \supsetneq L_2(I),$$

which is the range space of the operator $\Gamma^{-1/2}$ and is larger than $L_2(I)$. Below we give a few
examples, wherein the dimension reduction functions fall outside $L_2(I)$ and reside in $R(\Gamma^{-1/2})$.
These examples motivate us to consider an enlarged space for functional dimension reduction.
Interestingly, this enlarged space $R(\Gamma^{-1/2})$ is the space considered by Grenander (1950)
and Rao and Varadarajan (1963) in the study of linear discriminant
analysis of Gaussian measures on a separable Hilbert space.


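Example 1 (Binary response). Let $Y$ be a binary random variable having probabilities $P(Y =
1) = P(Y = -1) = 1/2$, and let $\{\psi_i\}_{i=1}^{\infty}$ be a complete orthonormal basis for $L_2(I)$. Given $Y = y$,
consider

$$X_y(t) = a y \sum_{i=1}^{\infty} \frac{1}{i^{2+\delta}} \psi_i(t) + \sum_{i=1}^{\infty} \frac{1}{i} Z_i \psi_i(t), \qquad t \in I,$$

where $0 < \delta < 1/2$ and $a$ is some scalar that controls the separation of the two groups. Here the $Z_i$'s
are independent standard normal random variables that are independent of $Y$. Let $\Gamma_e$ be the
between-group covariance and $\Gamma_w$ be the within-group covariance. Then $\Gamma = \Gamma_e + \Gamma_w$. We can
easily calculate the within-group covariance function as

$$\Gamma_w(s,t) = E[\mathrm{cov}\{X(s), X(t) \mid Y\}] = \sum_{i=1}^{\infty} \frac{1}{i^2} \psi_i(s)\psi_i(t),$$

and the between-group covariance function as

$$\Gamma_e(s,t) = \mathrm{cov}\Big\{ aY \sum_{i=1}^{\infty} \frac{1}{i^{2+\delta}} \psi_i(s),\ aY \sum_{i=1}^{\infty} \frac{1}{i^{2+\delta}} \psi_i(t) \Big\}
= a^2 \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \frac{1}{i^{2+\delta} j^{2+\delta}} \psi_i(s)\psi_j(t).$$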

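Proposition 3. The following two optimization problems

$$\operatorname*{argmax}_{\beta} \frac{\langle \Gamma_e \beta, \beta\rangle_{L_2}}{\langle \Gamma \beta, \beta\rangle_{L_2}} = \operatorname*{argmax}_{\beta} \frac{\langle \Gamma_e \beta, \beta\rangle_{L_2}}{\langle \Gamma_w \beta, \beta\rangle_{L_2}}$$

have the same solution $\beta$ given by

$$\beta(t) = c \sum_{i=1}^{\infty} \frac{1}{i^{\delta}} \psi_i(t) \tag{4}$$

for any constant $c$.

Proof. From $\Gamma = \Gamma_e + \Gamma_w$, we have

$$\frac{\langle \Gamma \beta, \beta\rangle_{L_2}}{\langle \Gamma_e \beta, \beta\rangle_{L_2}} = \frac{\langle \Gamma_e \beta, \beta\rangle_{L_2} + \langle \Gamma_w \beta, \beta\rangle_{L_2}}{\langle \Gamma_e \beta, \beta\rangle_{L_2}} = 1 + \frac{\langle \Gamma_w \beta, \beta\rangle_{L_2}}{\langle \Gamma_e \beta, \beta\rangle_{L_2}}.$$

Therefore,

$$\operatorname*{argmax}_{\beta} \frac{\langle \Gamma_e \beta, \beta\rangle_{L_2}}{\langle \Gamma \beta, \beta\rangle_{L_2}} = \operatorname*{argmax}_{\beta} \frac{\langle \Gamma_e \beta, \beta\rangle_{L_2}}{\langle \Gamma_w \beta, \beta\rangle_{L_2}}.$$

Let $\beta = \sum_{i=1}^{\infty} b_i \psi_i$. Then

$$\Gamma_e \beta = a^2 \Big( \sum_{j=1}^{\infty} \frac{b_j}{j^{2+\delta}} \Big) \sum_{i=1}^{\infty} \frac{1}{i^{2+\delta}} \psi_i, \qquad
\langle \Gamma_e \beta, \beta\rangle_{L_2} = a^2 \Big( \sum_{i=1}^{\infty} \frac{b_i}{i^{2+\delta}} \Big)^2,$$
$$\Gamma_w \beta = \sum_{i=1}^{\infty} \frac{b_i}{i^2} \psi_i, \qquad
\langle \Gamma_w \beta, \beta\rangle_{L_2} = \sum_{i=1}^{\infty} \frac{b_i^2}{i^2}.$$

Therefore, the optimization problem becomes to maximize

$$\frac{a^2 \big( \sum_{i=1}^{\infty} b_i / i^{2+\delta} \big)^2}{\sum_{i=1}^{\infty} b_i^2 / i^2}.$$

From the Cauchy-Schwarz inequality,

$$\Big( \sum_{i=1}^{\infty} \frac{b_i}{i^{2+\delta}} \Big)^2 \le \Big( \sum_{i=1}^{\infty} \frac{b_i^2}{i^2} \Big) \Big( \sum_{i=1}^{\infty} \frac{1}{i^{2+2\delta}} \Big).$$

The equality holds when $b_i \propto 1/i^{\delta}$, which means

$$\beta(t) \propto \sum_{i=1}^{\infty} \frac{1}{i^{\delta}} \psi_i(t)$$

is the maximum eigenfunction. □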
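The Cauchy-Schwarz argument in the proof can be illustrated in truncated coefficient space. This is only a sketch: the values of $a$, $\delta$, the truncation level and the random perturbations are hypothetical, and the helper name `rayleigh` is introduced here for illustration.

```python
import random

def rayleigh(b, a, delta):
    """<Gamma_e b, b> / <Gamma_w b, b> in the psi-basis, truncated to len(b) terms."""
    num = a ** 2 * sum(bi / i ** (2 + delta) for i, bi in enumerate(b, start=1)) ** 2
    den = sum(bi ** 2 / i ** 2 for i, bi in enumerate(b, start=1))
    return num / den

a, delta, n = 2.0, 0.25, 200                       # hypothetical settings
b_opt = [1.0 / i ** delta for i in range(1, n + 1)]  # b_i proportional to i^(-delta)
best = rayleigh(b_opt, a, delta)

# Cauchy-Schwarz upper bound: a^2 * sum_i i^(-(2 + 2*delta))
bound = a ** 2 * sum(1.0 / i ** (2 + 2 * delta) for i in range(1, n + 1))

rng = random.Random(1)
others = []
for _ in range(200):
    b = [bi + rng.gauss(0.0, 0.5) for bi in b_opt]  # random perturbation of b_opt
    others.append(rayleigh(b, a, delta))
```

In this truncated setting `best` attains `bound`, and no perturbed coefficient vector in `others` exceeds it, which is the finite-dimensional shadow of the equality case $b_i \propto 1/i^{\delta}$.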

The dimension reduction function $\beta(t)$ is obtained from solving the eigenvalue problem
$\Gamma_e \beta = \lambda \Gamma \beta$. The corresponding optimal linear classification rule is via

$$\mathrm{sign}\big( \langle \beta, X\rangle_{L_2} \big). \tag{5}$$

This result can be linked to some prior study of linear discriminant analysis of two Gaussian
measures on a separable Hilbert space by Grenander (1950) and Rao and
Varadarajan (1963). Let

$$m_y(t) = E\{X(t) \mid Y = y\} = a y \sum_{i=1}^{\infty} \frac{1}{i^{2+\delta}} \psi_i(t) = \Gamma_w^{1/2} \Big( a y \sum_{i=1}^{\infty} \frac{1}{i^{1+\delta}} \psi_i \Big)(t) \in R(\Gamma_w^{1/2}).$$

Note that $\beta$ given in (4) is not in $L_2(I)$, but in $R(\Gamma_w^{-1/2})$, since

$$\|\Gamma_w^{1/2} \beta\|_{L_2}^2 = c^2 \sum_{i=1}^{\infty} \frac{1}{i^{2+2\delta}} < \infty.$$

We also have $\|\Gamma_w^{-1/2} m_y\|_{L_2}^2 = a^2 \sum_{i=1}^{\infty} 1/i^{2+2\delta} < \infty$, i.e., $m_y$ is in $R(\Gamma_w^{1/2})$. Furthermore, from
Proposition 3 and its proof, we have $\Gamma_e \beta = c_1 m_y$, where $c_1 = c a y \sum_{i=1}^{\infty} 1/i^{2+2\delta}$, and $\Gamma_w \beta = c_2 m_y$,
where $c_2 = c(ay)^{-1}$. Therefore

$$\langle \Gamma^{1/2}\beta, \Gamma^{1/2}\beta\rangle_{L_2} = \langle \Gamma\beta, \beta\rangle_{L_2}
= \langle (\Gamma_w + \Gamma_e)\beta, \beta\rangle_{L_2} = \langle \Gamma_w\beta, \beta\rangle_{L_2} + \langle \Gamma_e\beta, \beta\rangle_{L_2}$$
$$= \|\Gamma_w^{1/2}\beta\|_{L_2}^2 + c_1 c_2 \langle m_y, \Gamma_w^{-1} m_y\rangle_{L_2}
= \|\Gamma_w^{1/2}\beta\|_{L_2}^2 + c_1 c_2 \|\Gamma_w^{-1/2} m_y\|_{L_2}^2 < \infty.$$

Hence, $\beta \in R(\Gamma^{-1/2})$.

This is an example in which $X \in L_2(I)$, $\beta \in R(\Gamma^{-1/2})$, but $\beta \notin L_2(I)$, and the classification rule
$\mathrm{sign}(\langle \beta, X\rangle_{L_2})$ is well-defined. This indicates that, to solve a linear discriminant analysis
problem in $L_2(I)$, we cannot restrict $\beta$ to $L_2(I)$. We are obliged to enlarge the domain of $\beta$
to $R(\Gamma^{-1/2})$. On the other hand, requiring $\beta \in R(\Gamma^{-1/2})$ is indeed sufficient for the purpose of
linear discriminant analysis given in (5) for classifying the observations into two groups.
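The two claims of Example 1, that $\beta \notin L_2(I)$ while the rule $\mathrm{sign}(\langle \beta, X\rangle_{L_2})$ is still usable, can be probed with truncated sums. The sketch below is an illustration under assumptions: $c = 1$, the values of $a$ and $\delta$, the truncation levels and the simulation size are hypothetical.

```python
import random

def partial_sums(delta, n):
    """Truncated ||beta||^2 and ||Gamma_w^{1/2} beta||^2 for beta_i = i^(-delta), c = 1."""
    l2 = sum(1.0 / i ** (2 * delta) for i in range(1, n + 1))
    scaled = sum(1.0 / i ** (2 + 2 * delta) for i in range(1, n + 1))
    return l2, scaled

def score(y, a, delta, n, rng):
    """Truncated <beta, X_y>: sum_i i^(-delta) * (a*y*i^(-(2+delta)) + Z_i / i)."""
    return sum(i ** (-delta) * (a * y * i ** (-(2 + delta)) + rng.gauss(0.0, 1.0) / i)
               for i in range(1, n + 1))

a, delta, n = 3.0, 0.25, 200                     # hypothetical settings
small = partial_sums(delta, 100)
large = partial_sums(delta, 10000)

rng = random.Random(3)
labels = [rng.choice((-1, 1)) for _ in range(1000)]
# Classify each simulated curve by the sign of its truncated score.
correct = sum(1 for y in labels if score(y, a, delta, n, rng) * y > 0)
accuracy = correct / len(labels)
```

With these settings the first partial sum keeps growing with the truncation level while the second has essentially converged, and the simulated rule recovers the label for nearly all draws.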


Example 2 (Categorical response). The feature revealed in Example 1 is not unique to a binary
response variable $Y$. When the response variable $Y$ is categorical, a similar phenomenon can be
observed. For example, consider the case where the response variable $Y$ is categorical with
possible values $y_1, \ldots, y_k$. We normalize the $y$ values so that $Y$ has mean zero and variance 1.
Let

$$X_y(t) = a y \sum_{i=1}^{\infty} \frac{1}{i^{2+\delta}} \psi_i(t) + \sum_{i=1}^{\infty} \frac{1}{i} Z_i \psi_i(t), \qquad t \in I. \tag{6}$$

We can easily verify that the within-group covariance function is

$$\Gamma_w(s,t) = E[\mathrm{cov}\{X(s), X(t) \mid Y\}] = \sum_{i=1}^{\infty} \frac{1}{i^2} \psi_i(s)\psi_i(t),$$

and the between-group covariance function is

$$\Gamma_e(s,t) = a^2 \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \frac{1}{i^{2+\delta} j^{2+\delta}} \psi_i(s)\psi_j(t).$$

Let $\Gamma = \Gamma_w + \Gamma_e$. Note that the forms of $\Gamma_w(s,t)$ and $\Gamma_e(s,t)$ here are exactly the same as
those in Example 1. Thus, when we perform the functional sliced inverse regression by solving
for the first eigenfunction,

$$\beta_1 = \operatorname*{argmax}_{\beta} \frac{\langle \Gamma_e \beta, \beta\rangle_{L_2}}{\langle \Gamma \beta, \beta\rangle_{L_2}},$$

we have exactly the same analysis as that in Example 1. It then leads to the same conclusion.
That is, we are obliged to enlarge the domain of $\beta$ to $R(\Gamma^{-1/2})$. On the other hand, requiring
$\beta \in R(\Gamma^{-1/2})$ is also sufficient for our purpose of classifying the observations into $k$ groups.


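Example 3 (Continuous response). Finally we provide an example with a continuous response
variable $Y$. Let $Y$ have mean zero and variance 1, and let

$$X_y(t) = a y \sum_{i=1}^{\infty} \frac{1}{i^{2+\delta}} \psi_i(t) + \sum_{i=1}^{\infty} \frac{1}{i} Z_i \psi_i(t), \qquad t \in I. \tag{7}$$

We can easily verify that the within-group covariance function is

$$\Gamma_w(s,t) = E\,\mathrm{cov}\{X(s), X(t) \mid Y\} = \sum_{i=1}^{\infty} \frac{1}{i^2} \psi_i(s)\psi_i(t).$$

The between-group covariance function is

$$\Gamma_e(s,t) = \mathrm{cov}\Big\{ aY \sum_{i=1}^{\infty} \frac{1}{i^{2+\delta}} \psi_i(s),\ aY \sum_{i=1}^{\infty} \frac{1}{i^{2+\delta}} \psi_i(t) \Big\}
= a^2 \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \frac{1}{i^{2+\delta} j^{2+\delta}} \psi_i(s)\psi_j(t).$$

Let $\Gamma = \Gamma_w + \Gamma_e$. Now the same analysis as that in Examples 1 and 2 leads to the conclusion
that, regardless of how many slices one decides to use, $\beta$ is in $R(\Gamma^{-1/2})$.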


4 Enlarged dimension reduction space and main results 

In this section, we present our main results. First, we establish in Theorem 1 an interesting link
between covariance operators on $L_2(I)$ and on $\mathcal{H}_\Gamma$. Next, we extend the functional dimension
reduction to a relaxed model with enlarged space given in (10). The reproducing kernel Hilbert
space $\mathcal{H}_\Gamma$, induced from the covariance operator $\Gamma$, defines a proper range space for the sliced
mean (see Proposition 6 and Theorem 2(a) below). It also plays the parallel role as the span
of $X$ in finite dimension (see Proposition [HI]). Note that $\mathcal{H}_\Gamma$ is equipped with an inner product
$\langle \cdot, \cdot\rangle_{\mathcal{H}_\Gamma}$. Interestingly, this inner product refers to the standardization (see equation (3) above
and equation (11) below) similar to the Mahalanobis distance and the standardization by the
covariance matrix in the finite vector case. We also study the linear design condition under the
relaxed model in Proposition 7.

4.1 Bounded operators on $L_2(I)$ and on $\mathcal{H}_\Gamma$

Theorem 1. Assume $\Gamma$ and $\Gamma_e$ are continuous, and respectively strictly positive definite and
positive semi-definite. Then, $\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}$ is a well-defined bounded linear operator on $L_2(I)$
if and only if $\Gamma_e$ is a well-defined bounded linear operator on $\mathcal{H}_\Gamma$.

Proof. Let $h = \sum_i c_i \phi_i$. Then,

$$\|\Gamma^{-1/2} h\|_{L_2}^2 = \Big\| \Gamma^{-1/2} \sum_i c_i \phi_i(\cdot) \Big\|_{L_2}^2 = \sum_i \frac{c_i^2}{\xi_i} = \|h\|_{\mathcal{H}_\Gamma}^2. \tag{8}$$

That is,

$$\Gamma^{-1/2} h \in L_2(I) \iff h \in \mathcal{H}_\Gamma. \tag{9}$$

For any $g \in L_2(I)$, there exists $h = \Gamma^{1/2} g \in \mathcal{H}_\Gamma$. Then

$$\|\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2} g\|_{L_2} = \|\Gamma^{-1/2}\Gamma_e h\|_{L_2} = \|\Gamma_e h\|_{\mathcal{H}_\Gamma}.$$

Together with (8), we have

$$\frac{\|\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2} g\|_{L_2}}{\|g\|_{L_2}} = \frac{\|\Gamma_e h\|_{\mathcal{H}_\Gamma}}{\|h\|_{\mathcal{H}_\Gamma}},$$

which yields the statement of the theorem. □

Remark 1. From (8), $\Gamma^{-1/2}$ is bounded when it is defined as a linear operator from $\mathcal{H}_\Gamma$ to
$L_2(I)$. Here boundedness refers to its induced operator norm, $\sup_{f \in \mathcal{H}_\Gamma} \|\Gamma^{-1/2} f\|_{L_2}^2 / \|f\|_{\mathcal{H}_\Gamma}^2 <
\infty$. However, when it operates on $f \in L_2(I)$, $\Gamma^{-1/2} f$ may not belong to $L_2(I)$. For example,
$\Gamma^{-1/2}\phi_i(t) = \xi_i^{-1/2}\phi_i(t)$, hence $\|\Gamma^{-1/2}\phi_i\|_{L_2}^2 / \|\phi_i\|_{L_2}^2 = \xi_i^{-1} \to \infty$ as $i \to \infty$. When combined
with the additional covariance operator $\Gamma_e$, Theorem 1 ensures the resulting operator
$\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}$ is a bounded linear operator on $L_2(I)$, i.e., $\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2} : L_2(I) \to L_2(I)$ is a
well-defined bounded operator. Note that $L_2(I)$ is a much larger space than $\mathcal{H}_\Gamma$. Thus, the
new operator composed of the three operators can be well-defined in a larger domain than the
original operator $\Gamma^{-1/2}$ can.
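The contrast drawn in Remark 1 can be made concrete in the eigenbasis of $\Gamma$. The following sketch assumes a rank-one $\Gamma_e = m \otimes m$ with $m \in \mathcal{H}_\Gamma$, eigenvalues $\xi_i = i^{-2}$ and coefficients $m_i = i^{-2.25}$; all of these are hypothetical choices. In that case the composite operator is $u \otimes u$ with $u_i = m_i / \sqrt{\xi_i}$, and its operator norm is $\|u\|_{L_2}^2 = \|m\|_{\mathcal{H}_\Gamma}^2$.

```python
import math
import random

n = 500                                              # hypothetical truncation level
xi = [1.0 / i ** 2 for i in range(1, n + 1)]         # assumed eigenvalues of Gamma
m = [1.0 / i ** 2.25 for i in range(1, n + 1)]       # assumed coefficients of m in H_Gamma

# Gamma^{-1/2} alone: ||Gamma^{-1/2} phi_i||^2 / ||phi_i||^2 = 1 / xi_i grows with i.
blow_up = [1.0 / x for x in xi]

# With Gamma_e = m (x) m, the composite operator is u (x) u with u_i = m_i / sqrt(xi_i).
u = [mi / math.sqrt(x) for mi, x in zip(m, xi)]
op_norm = sum(ui * ui for ui in u)                   # equals ||m||^2 in H_Gamma

def apply_composite(g):
    """Coefficients of Gamma^{-1/2} Gamma_e Gamma^{-1/2} g in the phi-basis."""
    inner = sum(ui * gi for ui, gi in zip(u, g))
    return [inner * ui for ui in u]

def norm(v):
    """Euclidean norm of a coefficient vector."""
    return math.sqrt(sum(x * x for x in v))

rng = random.Random(2)
ratios = []
for _ in range(100):
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]      # arbitrary test direction
    ratios.append(norm(apply_composite(g)) / norm(g))
```

The entries of `blow_up` increase without bound as the truncation level grows, while every ratio in `ratios` stays below `op_norm`, which remains finite because $m$ was chosen in $\mathcal{H}_\Gamma$.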


4.2 Relaxed model and extended estimation 


We are now in a position to revisit the functional dimension reduction problem studied
in Jiang et al. (2014), describe the problem more rigorously and extend it. Let $X(t)$, $t \in I$, be
a stochastic process satisfying $E \int X^2(t)\,dt < \infty$. Denote its covariance function and spectrum
by

$$\Gamma(s,t) = \mathrm{cov}\{X(s), X(t)\} = \sum_{i=1}^{\infty} \xi_i \phi_i(s)\phi_i(t).$$

Then, $X$ can be expressed by an expansion as

$$X(s) = \sum_{i=1}^{\infty} A_i \phi_i(s),$$

where the $A_i$'s are independent random variables with mean zero and variances $\xi_i$. Below we
give a proposition, which ensures that we can exchange the order of double integrals.

Proposition 4.

$$E\langle X, \phi_i\rangle_{L_2} = \langle E(X), \phi_i\rangle_{L_2}.$$

Proof. From the Cauchy-Schwarz inequality, we have

$$E \int |X(s)\phi_i(s)|\,ds \le E\Big\{ \Big( \int X^2(s)\,ds \Big)^{1/2} \Big( \int \phi_i^2(s)\,ds \Big)^{1/2} \Big\} = E \Big( \int X^2(s)\,ds \Big)^{1/2}.$$

From Jensen's inequality,

$$E \Big( \int X^2(s)\,ds \Big)^{1/2} \le \Big( E \int X^2(s)\,ds \Big)^{1/2} < \infty.$$

Thus, with $E \int |X(s)\phi_i(s)|\,ds < \infty$, we can apply Fubini's Theorem and get

$$E \int X(s)\phi_i(s)\,ds = \int E[X(s)]\phi_i(s)\,ds. \qquad \Box$$

Our proposed model is

$$Y = f(\langle \beta_1, X\rangle_{L_2}, \ldots, \langle \beta_d, X\rangle_{L_2}, \epsilon), \quad \text{where } \beta_j(\cdot) \in R(\Gamma^{-1/2}). \tag{10}$$

Note that a critical difference between our formulation here and that in Jiang et al. (2014) is that
we only require $\beta$ to be in $R(\Gamma^{-1/2})$, which is larger than $L_2(I)$. This extension allows more
flexibility in the dimension reduction functions.

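Proposition 5. For $\beta \in R(\Gamma^{-1/2})$, $\langle \beta, X\rangle_{L_2}$ is well-defined almost surely.

Proof. Let $\delta = \Gamma^{1/2}\beta \in L_2(I)$ and $\delta_i = \langle \delta, \phi_i\rangle_{L_2}$. We have

$$E\big( \langle \beta, X\rangle_{L_2}^2 \big) = E\Big( \sum_i \xi_i^{-1/2} \delta_i A_i \Big)^2 = \sum_i \delta_i^2 = \|\delta\|_{L_2}^2 < \infty,$$

which implies that $|\langle \beta, X\rangle_{L_2}| < \infty$ a.s. □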

Remark 2. Proposition 5 reveals an interesting result regarding the space to which $X$ belongs.
The finite second moment condition is commonly used in statistical analysis. In the finite
dimensional case, a random vector with finite second moment can have arbitrary variation
for each component of the random vector, hence the random vector can take values in the
entire space. However, this is not the case in the infinite dimensional functional space. To
ensure finite integrated variance, a random function cannot have arbitrary variation along each
dimension. In fact, the variations along all dimensions, except a finite set of dimensions, have
to degenerate sufficiently fast to guarantee finite total variance. Moreover, the set of dimensions in
which almost all variation accumulates is fixed for a single random function. As a consequence,
the random function cannot take values everywhere in $L_2(I)$. This is why the resulting space of
the random function $X$ is in fact a much smaller subspace of $L_2(I)$. A feature of this subspace
is that it ensures a finite inner product with elements in $R(\Gamma^{-1/2})$, where $\Gamma$ is the covariance
function of $X$. We define this space as

$$R(\Gamma^{1/2})^{+} = \{ f : \langle f, \beta\rangle_{L_2} < \infty \text{ a.s. } \forall \beta \in R(\Gamma^{-1/2}) \}.$$

Obviously $R(\Gamma^{1/2}) \subset R(\Gamma^{1/2})^{+} \subset L_2(I)$. We will encounter this space again when we present
an equivalent linearity condition later in Section 4.3. Note that although a single random
function $X$ belongs to a much smaller space $R(\Gamma^{1/2})^{+}$, the (uncountable) union of all such
spaces of all random functions is the entire $L_2(I)$.

Remark 3. For any $f \in R(\Gamma^{1/2})$, Proposition 5 ensures that the quantity $\langle \Gamma^{-1} f, X\rangle_{L_2}$ is
well-defined a.s. It is easy to verify the identity

$$\langle \Gamma^{-1/2} f, \Gamma^{-1/2} X\rangle_{L_2} = \langle \Gamma^{-1} f, X\rangle_{L_2} = \langle f, X\rangle_{\mathcal{H}_\Gamma}. \tag{11}$$

In the classical SIR, the main problem can be viewed as solving the eigenvalue problem of $\Gamma_e$ in
the space scaled by $\Gamma^{-1/2}$. Now in Functional Sliced Inverse Regression (FSIR), (11) indicates
that $\Gamma^{-1/2}$ can be again viewed as the scaled operator from $L_2(I)$ to $\mathcal{H}_\Gamma$.

In fact, the relaxed model leads to more flexible requirements on subsequent operators
needed in the estimation procedure, which in turn leads to less stringent conditions on quantities
such as mean covariates conditional on the response, etc. For example, in the FSIR
approach, we would search for $\beta$ from the functional eigenvalue problem

$$\Gamma_e \beta = \lambda \Gamma \beta, \tag{12}$$

where $\Gamma(s,t) = \mathrm{cov}\{X(s), X(t)\}$ as before, $m_Y(s) = E\{X(s) \mid Y\}$ and

$$\Gamma_e(s,t) = \mathrm{cov}[E\{X(s) \mid Y\}, E\{X(t) \mid Y\}] = \mathrm{cov}\{m_Y(s), m_Y(t)\}.$$

Letting $\eta = \Gamma^{1/2}\beta \in L_2(I)$ and rewriting (12) as

$$\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}\eta = \Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}(\Gamma^{1/2}\beta) = \lambda(\Gamma^{1/2}\beta) = \lambda\eta,$$

we would naturally require $\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}$ to be a well-defined operator from $L_2(I)$ to $L_2(I)$.
However, $\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}$ is restricted to operate on $R(\Gamma^{1/2})$ in Jiang et al. (2014). This restriction,
which comes naturally from their condition that $\beta \in L_2(I)$, leads to the conclusion that the slice
mean can only be in a restricted space $R(\Gamma)$ (Theorem 2(b) below) instead of in the space
$R(\Gamma^{1/2})$. In Theorem 2(a), we show that our relaxation on the domain of $\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}$ leads
to a more flexible condition on the conditional mean functions $m_Y(s)$.

Here, we first state a useful result in Proposition 6.


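Proposition 6. $R(\Gamma^{1/2}) = \mathcal{H}_\Gamma$.

Proof. A function $g \in R(\Gamma^{1/2})$ is equivalent to $g = \Gamma^{1/2} h$ and $h \in L_2(I)$. Now

$$\|g\|_{\mathcal{H}_\Gamma}^2 = \|\Gamma^{1/2} h\|_{\mathcal{H}_\Gamma}^2
= \Big\| \sum_i \xi_i^{1/2} \phi_i(s) \langle \phi_i(\cdot), h(\cdot)\rangle_{L_2} \Big\|_{\mathcal{H}_\Gamma}^2
= \sum_i \big\{ \xi_i^{1/2} \langle \phi_i(\cdot), h(\cdot)\rangle_{L_2} \big\}^2 / \xi_i
= \|h\|_{L_2}^2.$$

Thus $\|g\|_{\mathcal{H}_\Gamma}^2 < \infty$ is equivalent to $\|h\|_{L_2}^2 < \infty$, hence $g \in R(\Gamma^{1/2})$ is equivalent to $g \in \mathcal{H}_\Gamma$. □

Remark 4. Proposition 6 implies that, if $m_y \in \mathcal{H}_\Gamma$, then $m_y \in R(\Gamma^{1/2})$, and thus $\Gamma^{-1} m_y$ is in
$R(\Gamma^{-1/2})$. In the examples in Section 3 we have shown that the relaxation from $\beta \in L_2(I)$
to $\beta \in R(\Gamma^{-1/2})$ is crucial and that the condition $\beta \in R(\Gamma^{-1/2})$ is sufficient for $\langle \beta, X\rangle_{L_2}$ being
well-defined a.s. Furthermore, from the proof of Proposition 5 we have

$$|\langle \beta, m_y\rangle_{L_2}| = |\langle \beta, E(X \mid Y = y)\rangle_{L_2}| = |E(\langle \beta, X\rangle_{L_2} \mid Y = y)| \le \big[ E(\langle \beta, X\rangle_{L_2}^2 \mid Y = y) \big]^{1/2} < \infty,$$

which means that $m_y \in R(\Gamma^{1/2})$. Therefore, our relaxed condition on $\beta$ is sufficient to include
all possible $m_y$. In fact, it is also necessary, since for any proper subset $\Omega \subset R(\Gamma^{-1/2})$, there
always exists some $m_y$ so that the optimal $\beta = \Gamma^{-1} m_y \notin \Omega$.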

Theorem 2. Let $Y$ take values in a discrete finite set, say $\{1, \ldots, k\}$, with equal probability.

(a) If $\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}$ is a bounded operator from $L_2(I)$ to $L_2(I)$, then $m_y \in R(\Gamma^{1/2})$.

(b) Alternatively, if $\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}$ is a bounded operator from $R(\Gamma^{1/2})$ to $R(\Gamma^{1/2})$, then $m_y \in
R(\Gamma)$.

Proof. (a) From Theorem 1, if $\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}$ is a bounded operator on $L_2(I)$, then $\Gamma_e$ is a
bounded operator on $\mathcal{H}_\Gamma$, which means $h = \Gamma_e g \in \mathcal{H}_\Gamma$ for any $g \in \mathcal{H}_\Gamma$. Thus,

$$h = \Gamma_e g = \frac{1}{k} \sum_{y=1}^{k} (m_y \otimes m_y)\, g = \frac{1}{k} \sum_{y=1}^{k} \langle m_y, g\rangle_{L_2}\, m_y.$$

In order for the above function to be in $\mathcal{H}_\Gamma$ for arbitrary $g \in \mathcal{H}_\Gamma$, the $m_y$'s have to be in $\mathcal{H}_\Gamma$.

(b) For an arbitrary $g \in R(\Gamma^{1/2})$, $g_1 = \Gamma^{-1/2} g$ is in $L_2(I)$. Since $h = \Gamma^{-1/2}\Gamma_e\Gamma^{-1/2} g$ is in
$R(\Gamma^{1/2})$, we have that $\Gamma^{1/2} h$ is in $R(\Gamma)$. Furthermore, $\Gamma^{1/2} h$ can be expressed as

$$\Gamma^{1/2} h = \Gamma_e \Gamma^{-1/2} g = \frac{1}{k} \sum_{y=1}^{k} (m_y \otimes m_y)\, g_1 = \frac{1}{k} \sum_{y=1}^{k} \langle m_y, g_1\rangle_{L_2}\, m_y.$$

In order for the above function to be in $R(\Gamma)$ for an arbitrary function $g_1 \in L_2(I)$, the $m_y$'s have to
be in $R(\Gamma)$. □

We now examine how the formulation will affect the estimation procedure. Assume a
discrete $Y$ for simplicity. The $j$th slice mean function is given by

$$m_j(t) = E\{X(t) \mid Y = j\} = \sum_i E(A_i \mid Y = j)\phi_i(t).$$

Following Theorem 2 and Proposition 6, $m_j$ is in the RKHS $\mathcal{H}_\Gamma$. Assume that in the $j$th slice, we
have observations $\mathcal{D}_j = \{X_i(\mathcal{T})\}_{i=1}^{n_j}$, where $Y_i = j$, and we consider two types of $\mathcal{T}$. One is
$\mathcal{T} = I$, i.e., we observe the whole sample paths, and the other is $\mathcal{T} = \{t_k\}_{k=1}^{q}$, $q < \infty$, i.e., we
observe $X_i(t)$ at some common discrete time points.

Theorem 3 (Representer Theorem). Given the $j$th slice training sample $\mathcal{D}_j$ with $\mathcal{T} = \{t_k\}_{k=1}^{q}$,
an arbitrary empirical risk function $Q : \mathbb{R}^2 \mapsto \mathbb{R}$ and a scalar $C > 0$, consider the following
minimization problem

$$\operatorname*{argmin}_{m \in \mathcal{H}_\Gamma} \sum_{i=1}^{n_j} \sum_{t \in \mathcal{T}} Q\{X_i(t), m(t)\} + C\|m\|_{\mathcal{H}_\Gamma}^2. \tag{13}$$

Then the solution of the minimization problem exists and has the representation form

$$\hat{m}_j(s) = \sum_{k=1}^{q} \Gamma(s, t_k)a_{jk}, \qquad t_k \in \mathcal{T},\ a_{jk} \in \mathbb{R}. \tag{14}$$

Proof. Any function $m(s) \in \mathcal{H}_\Gamma$ can be expressed as

$$m(s) = \sum_{k=1}^{q} \Gamma(s, t_k)a_k + \nu(s),$$

where $\nu(s)$ is in $\mathcal{H}_\Gamma$ and orthogonal to every $\Gamma(s, t_k)$, $k = 1, \ldots, q$. By the reproducing property
of $\mathcal{H}_\Gamma$,

$$m(t_\ell) = \Big\langle \Gamma(s, t_\ell), \sum_{k=1}^{q} \Gamma(s, t_k)a_k + \nu(s) \Big\rangle_{\mathcal{H}_\Gamma} = \sum_{k=1}^{q} \Gamma(t_\ell, t_k)a_k, \qquad \ell = 1, \ldots, q,$$

which does not involve $\nu(s)$. This implies that the empirical risk function $Q$ in (13) also does
not involve $\nu(s)$. Since

$$\Big\| \sum_{k=1}^{q} \Gamma(s, t_k)a_k + \nu(s) \Big\|_{\mathcal{H}_\Gamma}^2 = \Big\| \sum_{k=1}^{q} \Gamma(s, t_k)a_k \Big\|_{\mathcal{H}_\Gamma}^2 + \|\nu\|_{\mathcal{H}_\Gamma}^2,$$

the regularization term in (13) is minimized by $\nu(s) = 0$. Therefore, the minimizer takes the
form $\hat{m}_j(s) = \sum_{k=1}^{q} \Gamma(s, t_k)a_{jk}$. □
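For a concrete instance of the representation (14), one can take the squared-error risk $Q(x, m) = (x - m)^2$, which is only one admissible choice of $Q$. In that case the coefficients solve $(K + (C/n_j) I)a = \bar{x}$, where $K = [\Gamma(t_\ell, t_k)]$ and $\bar{x}$ is the vector of pointwise slice averages. The sketch below is hedged accordingly: the Brownian-motion kernel $\min(s,t)$, the time points, the simulated curves and the helper names `solve`, `gamma`, `fit_slice_mean` and `objective` are all hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gamma(s, t):
    """Assumed covariance kernel (Brownian motion): min(s, t)."""
    return min(s, t)

def fit_slice_mean(X, T, C):
    """Coefficients a_k of m(s) = sum_k gamma(s, t_k) a_k under squared-error Q."""
    n, q = len(X), len(T)
    K = [[gamma(s, t) for t in T] for s in T]
    xbar = [sum(x[k] for x in X) / n for k in range(q)]
    A = [[K[r][c] + (C / n if r == c else 0.0) for c in range(q)] for r in range(q)]
    return solve(A, xbar), K

def objective(a, X, K, C):
    """Value of (13) for m = sum_k gamma(., t_k) a_k and squared-error Q."""
    q = len(a)
    m = [sum(K[r][c] * a[c] for c in range(q)) for r in range(q)]
    risk = sum((x[k] - m[k]) ** 2 for x in X for k in range(q))
    penalty = C * sum(a[r] * m[r] for r in range(q))   # C * a' K a = C * ||m||^2
    return risk + penalty

T = [0.2, 0.4, 0.6, 0.8, 1.0]                           # hypothetical time points
rng = random.Random(4)
X = [[t + rng.gauss(0.0, 0.3) for t in T] for _ in range(20)]  # simulated slice data
C = 1.0
a_hat, K = fit_slice_mean(X, T, C)

def m_hat(s):
    """Fitted slice mean evaluated at any s in [0, 1]."""
    return sum(gamma(s, t) * a for t, a in zip(T, a_hat))
```

Because the objective is convex in the coefficients for this choice of $Q$, perturbing any entry of `a_hat` should increase `objective`, which gives a simple sanity check of the closed form.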

From the proof of Theorem 3 we can see that the role of $C\|m\|_{\mathcal{H}_\Gamma}^2$ in (13) is to force $\nu$
to be zero and hence to guarantee a unique solution of the minimization problem. If we set
$C = 0$, $\nu$ can be chosen freely as any function orthogonal to the $\Gamma(s, t_k)$'s and it will not affect the
target value in (13). This freedom occurs because $\{\Gamma(s, t_k)\}_{k=1}^{q}$ do not span the whole $\mathcal{H}_\Gamma$. If
such freedom vanishes, which happens for example when the entire $X_i(t)$'s are observed, then
we no longer need the term $C\|m\|_{\mathcal{H}_\Gamma}^2$ to induce uniqueness. An additional utility of
the "penalty" term $C\|m\|_{\mathcal{H}_\Gamma}^2$ is to regularize the solution. It provides a balance between the
best data fit evaluated by the risk function and the variability of the solution.


Remark 5. If we modify the minimization problem (13) by restricting the residing space of
the slice mean to a smaller subspace $R(\Gamma)$,

$$\operatorname*{argmin}_{m \in R(\Gamma)} \sum_{i=1}^{n_j} \sum_{t \in \mathcal{T}} Q\{X_i(t), m(t)\} + C\|m\|_{\mathcal{H}_\Gamma}^2, \tag{15}$$

then the representation form (14) might not be valid anymore.

Remark 6. When the observations are the entire paths, i.e., $\mathcal{T} = I$, we can choose $C = 0$
and modify the minimization (13) to

$$\operatorname*{argmin}_{m \in \mathcal{H}_\Gamma} \sum_{i=1}^{n_j} Q(X_i, m), \tag{16}$$

where now $Q$ is a bivariate risk functional. A typical bivariate risk functional is the quadratic
one, i.e., $Q(f, g) = \langle A(f - g), f - g\rangle_{L_2}$, where $A$ is a symmetric strictly positive definite linear
integral operator with $\theta_\ell$ and $h_\ell$ ($\ell = 1, \ldots, \infty$) as its eigenvalues and eigenfunctions. In
this case, $Q(f, g) = \sum_{\ell=1}^{\infty} \theta_\ell (f_\ell - g_\ell)^2$ for $f = \sum_{\ell=1}^{\infty} f_\ell h_\ell$ and $g = \sum_{\ell=1}^{\infty} g_\ell h_\ell$. Write $X_i$ as
$X_i = \sum_{\ell=1}^{\infty} x_{i\ell} h_\ell$ and $m = \sum_{\ell=1}^{\infty} m_\ell h_\ell$. Then

$$\sum_{i=1}^{n_j} Q(X_i, m) = \sum_{i=1}^{n_j} \sum_{\ell=1}^{\infty} \theta_\ell (x_{i\ell} - m_\ell)^2 = \sum_{\ell=1}^{\infty} \theta_\ell \sum_{i=1}^{n_j} (x_{i\ell} - m_\ell)^2.$$

The above term is minimized when $m_\ell = \sum_{i=1}^{n_j} x_{i\ell} / n_j$ for all $\ell$. That is, the mean path is the
minimizer of (16).
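The coordinatewise argument of Remark 6 can be verified on a toy truncated example. The eigenvalues `theta` and the coefficient table `X` below are hypothetical numbers chosen for illustration, and `quadratic_risk` is a name introduced here.

```python
theta = [1.0, 0.5, 0.25]                        # hypothetical eigenvalues of A
X = [[1.0, 2.0, 0.0],                           # hypothetical coefficients x_{il}
     [3.0, 0.0, 1.0],
     [2.0, 1.0, 2.0]]

def quadratic_risk(m, X, theta):
    """sum_i sum_l theta_l * (x_il - m_l)^2, the truncated form of sum_i Q(X_i, m)."""
    return sum(th * (x[l] - m[l]) ** 2 for x in X for l, th in enumerate(theta))

# Coordinatewise average of the curves' coefficients, i.e. the mean path.
mean_path = [sum(x[l] for x in X) / len(X) for l in range(len(theta))]
risk_at_mean = quadratic_risk(mean_path, X, theta)

# Risk after shifting one coordinate of the mean path at a time.
shifted = [
    quadratic_risk([v + (0.1 if l == k else 0.0) for l, v in enumerate(mean_path)], X, theta)
    for k in range(len(theta))
]
```

Every entry of `shifted` exceeds `risk_at_mean`, consistent with the mean path minimizing the quadratic risk whenever all $\theta_\ell$ are positive.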


Remark 7. With a given covariance estimator, the slice means can be expressed as a linear
combination of covariance functions at training data points, as presented in (14). When the
covariance estimator is given, the estimation of slice means becomes less challenging. The most
difficult part of estimation in FSIR is the estimation of covariance operator. High-dimensional 
covariance estimation is a difficult problem, and the functional case is even more challenging. 
Our aim here is to set up a right framework for the functional inverse regression in an enlarged 
space. Therefore, we do not further discuss the estimation of the covariance operator. 


4.3 Linearity condition re-expressed 

Recall that SIR requires a linearity condition, which, in the functional dimension reduction
framework, is written as the following: For any $b \in R(\Gamma^{-1/2})$ there exist $a_0, a_1, \ldots, a_k \in \mathbb{R}$
such that

$$E\big( \langle b, X\rangle_{L_2} \mid \langle \beta_1, X\rangle_{L_2}, \ldots, \langle \beta_k, X\rangle_{L_2} \big) = a_0 + \sum_{j=1}^{k} a_j \langle \beta_j, X\rangle_{L_2}, \tag{17}$$

where $\beta_1, \ldots, \beta_k \in R(\Gamma^{-1/2})$. Below we give a more direct linearity condition statement, which
is equivalent to the one given by (17):

$$E\big( X(s) \mid \langle \beta_1, X\rangle_{L_2}, \ldots, \langle \beta_k, X\rangle_{L_2} \big)$$

is linear in $\langle \beta_1, X\rangle_{L_2}, \ldots, \langle \beta_k, X\rangle_{L_2}$, $\forall s \in I$,

where $\beta_1, \ldots, \beta_k \in R(\Gamma^{-1/2})$. That is, there exist $a_j(\cdot)$'s $\in R(\Gamma^{1/2})^{+}$ such that

$$E\big( X(s) \mid \langle \beta_1, X\rangle_{L_2}, \ldots, \langle \beta_k, X\rangle_{L_2} \big) = a_0(s) + \sum_{j=1}^{k} a_j(s) \langle \beta_j, X\rangle_{L_2}, \qquad \forall s \in I. \tag{18}$$

Remark 8. It is easy to check that $R(\Gamma^{1/2})^{+} \subset L_2(I)$. Since the functions $a_j$'s in (18) should
belong to the same space where $X$ resides, they belong to $R(\Gamma^{1/2})^{+}$ based on Proposition 5,
which is smaller than $L_2(I)$. This fact about the $a_j$'s is masked when (17) is used to describe the
linearity condition. However, we can see that this condition on the $a_j$'s is indeed necessary and
sufficient from the following proof of Proposition 7.


Proposition 7. The two versions of the functional linearity condition given in (17) and (18) are
equivalent.

Proof. Assume (17) holds. Consider the evaluation functional $E_s(X) = X(s)$. Let $b_s(\cdot) =
\sum_i \phi_i(s)\phi_i(\cdot)$. Since $\|\Gamma^{1/2} b_s(\cdot)\|_{L_2}^2 = \sum_i \xi_i \phi_i^2(s) = \Gamma(s,s) < \infty$ for any $s$, we have $b_s(\cdot) \in
R(\Gamma^{-1/2})$ for any $s$. Obviously

$$\langle b_s, X\rangle_{L_2} = E_s(X) = X(s). \tag{19}$$

Thus,

$$E\{ X(s) \mid \langle \beta_1, X\rangle_{L_2}, \ldots, \langle \beta_k, X\rangle_{L_2} \} = E\big( \langle b_s, X\rangle_{L_2} \mid \langle \beta_1, X\rangle_{L_2}, \ldots, \langle \beta_k, X\rangle_{L_2} \big)
= a_0(s) + \sum_{j=1}^{k} a_j(s) \langle \beta_j, X\rangle_{L_2}. \tag{20}$$

By Proposition 5 we have $X \in R(\Gamma^{1/2})^{+}$. It is then easy to see from the identities (20) that the
$a_j$'s are in $R(\Gamma^{1/2})^{+}$. Hence (18) holds.

On the other hand, assume (18) holds,

$$E\big( X(s) \mid \langle \beta_1, X\rangle_{L_2}, \ldots, \langle \beta_k, X\rangle_{L_2} \big) = a_0(s) + \sum_{j=1}^{k} a_j(s) \langle \beta_j, X\rangle_{L_2}.$$

Now for any $b(s) \in R(\Gamma^{-1/2})$, taking the inner product of $b$ with both sides above, we obtain

$$E\big( \langle b, X\rangle_{L_2} \mid \langle \beta_1, X\rangle_{L_2}, \ldots, \langle \beta_k, X\rangle_{L_2} \big) = \langle b, a_0\rangle_{L_2} + \sum_{j=1}^{k} \langle b, a_j\rangle_{L_2} \langle \beta_j, X\rangle_{L_2}.$$

Since $b \in R(\Gamma^{-1/2})$ and $a_j \in R(\Gamma^{1/2})^{+}$, $\langle b, a_j\rangle_{L_2} < \infty$. Therefore, (17) holds. □

Remark 9. We provide a neat expression (18) for the linearity condition. In the classical
SIR, the corresponding condition of (17) is: For any $b \in \mathbb{R}^p$, there exist $a_0, a_1, \ldots, a_k \in \mathbb{R}$
such that

$$E\big( b^T X \mid \beta_1^T X, \ldots, \beta_k^T X \big) = a_0 + \sum_{j=1}^{k} a_j \beta_j^T X.$$

The corresponding condition of (18) is: There exist $a_j$'s in $\mathbb{R}^p$ such that

$$E\big( X \mid \beta_1^T X, \ldots, \beta_k^T X \big) = a_0 + \sum_{j=1}^{k} a_j \beta_j^T X.$$

They are equivalent by arguments similar to those above. Interestingly, such an equivalent description
of the functional linearity condition seems only possible when we allow $\beta \in R(\Gamma^{-1/2})$. In the
original framework of Jiang et al. (2014), where $\beta$ is required to be in $L_2(I)$, we are unable
to obtain such an equivalent description, as the representation function $b_s(\cdot)$ in (19) for the
evaluation functional $E_s$ is in $R(\Gamma^{-1/2})$ but not in $L_2(I)$.
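The behaviour of the representer $b_s$ can be checked on a case with a known spectrum. The sketch below assumes a Brownian bridge covariance $\Gamma(s,t) = \min(s,t) - st$, whose eigenpairs are $\xi_i = (i\pi)^{-2}$ and $\phi_i(t) = \sqrt{2}\sin(i\pi t)$; this kernel, the evaluation point and the truncation level are assumptions made for illustration and do not come from the text.

```python
import math

def bs_norms(s, n):
    """Truncated ||b_s||^2 and ||Gamma^{1/2} b_s||^2 for b_s = sum_i phi_i(s) phi_i."""
    l2 = 0.0
    scaled = 0.0
    for i in range(1, n + 1):
        phi_sq = 2.0 * math.sin(i * math.pi * s) ** 2   # phi_i(s)^2
        xi = 1.0 / (i * math.pi) ** 2                    # Brownian-bridge eigenvalue
        l2 += phi_sq
        scaled += xi * phi_sq
    return l2, scaled

s, n = 0.3, 20000                                        # hypothetical point and truncation
l2_partial, scaled_partial = bs_norms(s, n)
gamma_ss = s * (1.0 - s)                                 # Gamma(s, s) for the Brownian bridge
```

Under these assumptions `scaled_partial` approaches $\Gamma(s,s) = s(1-s)$, while `l2_partial` grows roughly linearly in the truncation level, matching the statement that $b_s$ is in $R(\Gamma^{-1/2})$ but not in $L_2(I)$.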


5 Conclusion 

We have described an extension of the dimension reduction models to the functional data
framework. Our extension is based on careful and rigorous considerations in operator theory
and functional analysis. We mainly focused on generalizing concepts in the classical dimension
reduction problems into the new framework and on enlarging the functional space of the
reduction function $\beta$. We found some interesting examples where such increased flexibility is
indeed needed, and we discovered an equivalent expression of the popular linearity condition.
While our analysis is based on FSIR, we believe similar analysis can be applied to other
functional inverse regression based methods. It will be interesting to study how other methods
in the classical dimension reduction models can be properly extended to the functional data
framework.